Data Sets

Datasets were searched for using the same filters as before - must be human tissue that expressed lewy bodies, must have at least 3 patients and 3 controls, and must…

10 datasets were identified:

CodeID Tissue/Cell type Variant Samples Type of array GEO Ref Comments PIs
DIJ Substantia Nigra not specified 15 disease, 8 control Affymetrix Human Genome U133 Plus 2.0 Array GSE49036 Has some Braak staging Dijkstra AA, Ingrassia AI, de Menezes RX, van Kesteren RE, Rozemuller AM, Heutink P, vandeBerg WJ
FFR Substantia Nigra not specified 9 replicates for the controls and 16 replicates for the Parkinson’s disease patients Affymetrix Human Genome U133 Plus 2.0 Array GSE7621 Ffrench-Mullen JM
MOR.SN substantia nigra, split into medial and lateral portions sporadic 15 samples of medial parkinsonian SN, 9 samples of lateral parkinsonian SN (24), 8 medial nigra control samples and 7 lateral nigra control samples(15) Affymetrix Human Genome U133A Array GSE8397 Age and sex Moran LB, Graeber MB
MID1 Substantia Nigra pars compacta not specified 10 PD brain samples and 8 control Affymetrix Human Genome U133 Plus 2.0 Array GSE20141 Middleton FA, Kim PD, Zhang-James Y, Davis RL
MID2 whole substantia nigra not specified 18 controls, 11 patients Affymetrix Human Genome U133A Array GSE20292 Age and sex Middleton FA, James M, Zhang Y, Davis RL
DUM Cortex (BA9) not specified 29 PD, 44 neurologically normal controls Illumina HiSeq 2000 GSE68719 Have age of death, sex, and some samples have corresponding proteomics Dumitriu A, Golji J, Labadorf AT, Gao B, Beach TG, Myers RH, Longo KA, Latourelle JC
MOR.FC Frontal cortex sporadic 3 Controls 5 Patients Affymetrix Human Genome U133A Array GSE8397 Age and sex Moran LB, Graeber MB
LEW two regions of the medulla not specified 14 PD samples and 8 controls (two regions) Affymetrix Human Genome U133A 2.0 Array GSE19587 Age and sex lewandowski nm, small sa
MID3 prefrontal cortex not specified 15 controls 14 patients Affymetrix Human Genome U133A Array GSE20168 Age and sex Middleton FA, James M, Zhang Y, Davis RL
MID4 putamen not specified 20 controls and 15 patients Affymetrix Human Genome U133A Array GSE20291 Age and sex Middleton FA, James M, Zhang Y, Davis RL

Quality Control - PD_QC.R

Checked data distribution using PCA and boxplots. Some of the datasets have quite mixed samples wereas others separate quite nicely into controls and patients. Boxplots didn’t suggest any samples were outliers.

Differential Gene Expression

As with the TDP-43 data, microarray data was analysed using Limma. The genes were divided into positive and negative fold change. Intersected gene lists can be found in /Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression/FoldChange.

Commonly Upregulated Genes

~~~ ~~~ ~~~ ~~~ ~~~ ~~~ ~~~ ~~~ ~~~ ~~~ ~~~
A4GNT C3 CPED1 FAM46A HCK KDM4A LY6G5C NRDE2 RBMS2 SMIM14 TNFRSF10B
ABCC6 C3AR1 CPM FAM60A HDC KDM4C LYVE1 NYX RCC1 SNAP23 TNFRSF10D
AC012065.7 C8orf60 CPSF3L FANCE HERC2P3 KIAA0101 LYZL6 OR1F1 RDH16 SND1-IT1 TNFRSF25
ACP5 C9orf3 CROCC FAS HIST1H1T KIAA0226L MAF ORM1 RHBDF2 SOGA1 TNFSF14
ACSBG2 CALML4 CROCCP3 FASLG HIST1H2AB KIAA0485 MAFF OSM RIBC2 SORBS1 TNN
ACSS3 CASP1 CSPG4 FCGBP HIST1H2AD KIAA1614 MAP2K7 P4HA1 RIPK2 SOS2 TP73-AS1
ADH1B CASP2 CTBP2 FCGR2A HMOX1 KIAA1661 MAPKBP1 PAGE1 RNASE6 SOX12 TRAF1
ADIPOQ CASP4 CTD-2269F5.1 FCGR2B HNRNPM KIF13A MAVS PAPOLA RP11-217B7.2 SOX2 TRAPPC10
ADORA3 CASP6 CXCR4 FCRL2 HOXB8 KLHL36 MBD5 PARP16 RP11-255C15.3 SOX9 TRIM21
AF007147 CASP7 CXorf21 FGF17 HP1BP3 KLK2 MBTD1 PAWR RP11-403P17.4 SP140L TRIM5
AGAP10 CASP8 CXorf36 FGF3 HS3ST1 KRI1 MCM3AP-AS1 PCDH12 RP11-548H18.2 SPG21 TTLL5
AGRP CATSPER2 CYP21A1P FGFR1 HSPA1A KRT19P2 MECOM PCDHGA8 RPA4 SPINK1 TUBD1
ALOX5 CCDC101 CYP39A1 FKBP10 HSPA1L L2HGDH MED13L PDLIM1 RPL23AP32 SPTLC3 TULP2
ALOX5AP CCDC15 CYTH4 FLCN HSPA6 LAG3 METTL4 PELI2 RPL23AP53 SRP19 TYMP
ALPK1 CCDC40 DAPP1 FLJ21369 HSPD1 LAMB2 METTL7A PGF RREB1 SRSF5 TYROBP
AMELX CCL17 DBF4B FLT1 IFNA2 LAT2 MFAP5 PGPEP1 RRNAD1 SSTR3 UGGT1
AMELY CCL22 DCLRE1C FLT3LG IFNA21 LECT1 MICB PIDD1 RUNX1-IT1 ST20 UPK3B
AMH CCL27 DDR1-AS1 FMO5 IGF1R LEF1 MID1 PIGO RUNX3 ST6GAL1 USP34
ANGPT2 CD14 DENND2D FNDC3B IGFBP5 LENEP MILR1 PILRA S100A11P1 STAG3L3 VPS54
ANP32A-IT1 CD163 DFFB FOXD1 IGKV1-17 LEPREL1 MIR612 PIM1 S100A4 STIP1 VSIG4
AP1G2 CD207 DIP2A FOXL2 IGLL3P LEPREL4 MKRN4P PLAC8 SAFB2 STK10 WBSCR16
APOBEC3C CD22 DISC1 FPR1 IGSF9B LGALS9 MS4A6A PLEKHF2 SAT1 STOM WDR4
APOC2 CD247 DKFZP434A062 FYB IL12RB1 LILRA2 MSH5 PLGLB1 SCAF4 TAF4 WDR52
AQP8 CD300A DMC1 FZD5 IL16 LINC00115 MSX1 PLIN2 SCD5 TAL1 WDR55
ARHGAP25 CD37 DNA2 FZD7 IL17RB LINC00260 MT1M PLK3 SCIN TAS2R10 WDR78
ARHGDIB CD48 DNAJB1 FZD9 IL18 LINC00894 MYH7B PLOD2 SCNN1B TAS2R13 XRCC2
ARHGEF4 CD68 DNAJB6 G0S2 IL1R1 LINC00963 MYO1C PLTP SDCCAG3 TAZ ZBED2
ARID3A CD84 DNASE2 G3BP1 IL21 LLGL2 MZT2B PODNL1 SERPINA1 TBX6 ZC3HAV1
ATAD2B CDA DOCK10 GABPA INE1 LOC100272216 NABP1 POGLUT1 SERPINH1 TBXAS1 ZFP36L1
ATHL1 CDC14A DOCK2 GAL3ST4 INPP5B LOC100506699 NACAP1 POGZ SERTAD3 TCF3 ZMYND8
ATP8B1 CDK2AP2 DSE GAS1 INPP5D LOC101928198 NBEAL2 PPP1R14D SH2B2 TFAP2B ZNF14
ATXN7 CEBPD DTYMK GATA1 INVS LOC101928274 NBR2 PPP2R1B SHMT1 TFEC ZNF217
AVIL CEL EEF1DP5 GCKR IRF4 LOC101929240 NCKAP1L PRR11 SIGLEC7 TFPI ZNF224
AZGP1 CHD2 EFNA4 GDF15 IRF6 LOC101929889 NEUROG1 PRRG4 SIGLEC8 TGFBR3 ZNF235
AZGP1P1 CHKB-CPT1B EGF GDF9 IRF7 LOC102724229 NFATC3 PSCA SIM2 TGIF1 ZNF34
AZU1 CHRNG ELF4 GEM ITFG2 LOC102725016 NFE2 PSG11 SIX6 TIMM44 ZNF354A
BANP CHST4 EMC10 GJA9-MYCBP ITGB6 LOC284009 NHLH1 PTPN14 SLA TLR5 ZNF430
BC069804 CLEC2B ENTPD1 GJD2 ITPR3 LOC729164 NOTCH2NL PTPRCAP SLAMF8 TM4SF1 ZNF446
BMP8B CLIC2 ENTPD7 GK2 JMJD6 LOC79999 NOX3 RAB13 SLC11A1 TMEM176A ZNF516
BRPF1 CLK1 EP300 GP9 KCNE4 LPAR2 NPIPA1 RAC2 SLC27A3 TMEM19 ZNF665
BTBD18 COL16A1 EPOR GPR25 KCNJ8 LPAR4 NPIPB15 RAD52 SLC2A5 TMEM51 ZNF670
C10orf12 COL5A1 ETV1 GRAMD3 KCNK5 LPP NPL RAD54L SLC35C2 TMPRSS11E ZNF710
C11orf16 COL7A1 FAM106A GTSE1 KCNQ1 LRRC1 NPR3 RBL1 SLC7A9 TMPRSS5 ZNF721
C1QB COL9A1 FAM118A GUSBP3 KCTD5 LRRFIP1 NR6A1 RBM38 SMAD6 TNFAIP8 ZNF74
ZNF783

Commonly Downregulated Genes

~~~ ~~~ ~~~ ~~~ ~~~ ~~~ ~~~ ~~~ ~~~ ~~~ ~~~
ABCA11P ATP5D CLINT1 EIF2B3 GPN1 LOC728392 MZT2A PFDN2 PSMD8 SLC9A1 TSSC1
ABCF1 ATP5F1 CLIP3 EIF2S1 GPX4 LRBA NAPA PFKM PSME3 SLITRK5 TTC19
ABHD11 ATP5G1 CLTB EIF3F GRAMD1B LRPPRC NDEL1 PFN2 PSMG1 SNCA TTC9
ACAT1 ATP6AP1 CMAS EIF3I GSS LYRM4 NDFIP1 PGAM1 PTBP2 SNRPN TUBA1B
ACAT2 ATP6V0B COA3 EIF3K GSTA4 LYRM9 NDUFA10 PGRMC1 PTDSS1 SNX24 TUBA1C
ACP2 ATP6V0C COA7 EIF6 GSTO1 MAGED1 NDUFA13 PHB2 PTP4A1 SORD TUBB
ACTB ATP6V1E1 COMMD9 ELOVL6 HARS MAGED2 NDUFA3 PINK1 PTPRA SPAG16 TUBB2A
ACTR1A ATP6V1F COPS3 ELP3 HERC6 MAK16 NDUFA7 PITPNA RAB14 SPAG7 TUBB3
ADAM23 ATP6V1G2 COPS4 EMC7 HINT1 MAP1B NDUFA9 PITPNB RABEPK SPCS1 TUBG1
AFG3L2 ATP6V1H COPS5 ENDOD1 HLTF MAP1LC3B NDUFAB1 PITRM1 RAN SPCS3 TUFM
AGTPBP1 AURKAIP1 COPS7A ENOPH1 HMCES MAPK10 NDUFAF1 PKI55 RANGRF SPRYD7 TUSC2
AHCY AVPI1 COQ3 ENSA HMGN4 MDH1 NDUFB5 PLD3 RARS SRPRB TXNDC15
AIFM1 B3GNT2 COX7A2L ENTPD6 HNRNPA0 MDH2 NDUFS1 PLEKHB2 RBFOX2 SRRD UBB
AIMP2 BABAM1 COX8A ERCC1 HPCAL4 ME2 NDUFS3 PMPCA RBM3 SSSCA1 UBE2E3
AJAP1 BECN1 CREB3 ERP29 HSPA12A ME3 NDUFS5 PMS2 RITA1 ST3GAL5 UBE3B
AK055981 BEND5 CRMP1 EXOSC9 IARS MECR NDUFV1 PMS2P1 RNASEH1 STAT4 UCHL1
AK5 BEX4 CRY2 FABP3 IARS2 MFN2 NDUFV2 PMS2P5 RNF10 STRAP UQCRC1
AKAP12 BFSP1 CSNK2A1 FAF1 IDH3B MGST3 NEDD8 POLR2C RPA2 STS UQCRFS1
AKAP6 BPGM CSRNP2 FAM127A IDH3G MICU1 NEU1 POLR3C RPH3A STX12 UQCRQ
AKTIP BRE CTNNA2 FAM134C IDI1 MIEF1 NGFRAP1 POP4 RPP14 STXBP1 UROD
ALAS1 BRF2 CUL2 FAM162A IMP4 MIF NHP2 PPFIA2 RPS6KC1 SUMO3 UROS
ALDH1A1 BSN CX3CL1 FAM168A INSIG1 MIR21 NIF3L1 PPIA RRAGA SYN1 USP7
ANXA2 BTBD3 CXorf40A FAM206A IQSEC1 MIR3656 NIPSNAP1 PPME1 RTCB SYT11 UTP18
ANXA6 C11orf49 CXorf40A FBXW2 IRGQ MIR4784 NIT2 PPP1R7 RTF1 TACO1 VAMP7
AP1S1 C12orf10 CXorf40B FDPS ITFG1 MIR636 NME1 PPP2CA RTN4 TCEA2 VLDLR
AP2M1 C1orf216 CYC1 FECH ITPA MIR6890 NOP16 PPP2R1A RWDD2B TCEAL2 VPS41
AP2S1 C21orf33 CYP2E1 FHIT ITPR1 MLH1 NRXN3 PPP2R2B SAMM50 TCEAL4 VPS4B
AP3M2 C3orf62 DCTN3 FIBP KATNB1 MMP24-AS1 NSDHL PPP2R5D SARS TFCP2 VPS51
AP3S2 C5orf30 DCTN6 FIG4 KHDRBS1 MOCS2 NUDCD3 PPP3CB SCAMP5 TFPT WBP11
APEH C6orf106 DCTPP1 FKBP1B KIAA0391 MOSPD1 NUDT2 PRC1 SCFD1 TIMM10 WDR44
APEX1 CCBL2 DDA1 FLRT1 KIAA0513 MPI NUP155 PRKCZ SDHA TIMM10B WDR61
APLP2 CCDC51 DDHD2 FRMPD4 KIAA1279 MRPL15 ODC1 PRMT8 SDHAF1 TIMM8B WDR7
APMAP CCK DDX1 FTO KIF2A MRPL35 ODF2 PRPF4 SDHAP1 TMED3 WDR77
AREL1 CCNH DDX24 FUCA1 KIF3C MRPL4 OPTN PRR13 SEC31A TMEM177 WDYHV1
ARF1 CCSER2 DDX42 GABARAPL1 KIFAP3 MRPS18A ORC5 PSD3 SEH1L TMEM208 YWHAZ
ARF3 CDC123 DGUOK GABBR1 KLHDC4 MRPS22 OSBP PSMA1 SERINC3 TMEM246 ZDHHC4
ARFGAP2 CDC42 DHRS11 GALNT11 L1CAM MRPS33 OXCT1 PSMA5 SEZ6L2 TMEM41B ZDHHC6
ARHGEF9 CDK20 DHRS7B GARS LANCL1 MRPS35 OXLD1 PSMB3 SF3A3 TMEM97 ZMYM4
ARL2BP CDK5 DIABLO GBAS LANCL2 MRPS7 PAAF1 PSMB4 SF3B5 TMUB2 ZNF365
ARMC2-AS1 CDK7 DKK3 GGNBP2 LBH MRTO4 PAFAH1B1 PSMB5 SH3BP5 TOMM20 ZNF593
ARPC1A CDS2 DLD GLO1 LCMT1 MTCH1 PARL PSMB6 SIPA1L1 TOR1AIP2 ZNF629
ARPC5L CERK DNAJC8 GLOD4 LDHA MTCH2 PARN PSMB7 SLC23A2 TOX4
ASS1 CFL1 DRG1 GLT8D1 LDHB MTCL1 PCCB PSMC1 SLC25A11 TP53BP1
ASTN2 CHCHD2 DSTN GMPR2 LDLRAD4 MTHFD1 PCYOX1L PSMC2 SLC25A3 TPGS2
ATG14 CHP1 DUSP26 GNB1 LDOC1 MTPAP PDCL3 PSMC5 SLC25A46 TRAP1
ATP1A1 CHST1 DYNC1H1 GNB5 LETMD1 MTX2 PDHB PSMD1 SLC25A5 TRAPPC2L
ATP1A3 CIAPIN1 DYNC1I1 GOT1 LINC00094 MX1 PDK2 PSMD11 SLC25A6 TSG101
ATP2B2 CIRBP DZIP3 GOT2 LMO3 MYH10 PDZD8 PSMD14 SLC30A9 TSPAN3
ATP5A1 CITED1 EAPP GPD1L LOC101927673 MYL12B PEX14 PSMD2 SLC35B1 TSPAN7
ATP5B CLCN6 EIF2AK1 GPI LOC645166 MYL5 PEX19 PSMD7 SLC6A1 TSR2

Mouse data

Mouse data was sourced from GEO https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE4758, referenced as CON, however during the analysis I realised that alpha synuclein had a positive log fold change, even though every other dataset reported a negative log fold change for sporadic parkinson’s disease. Looking back at the dataset I realised it was an overexpression model rather than a mutant. Woops!

Alpha synuclein expression

What’s interesting is that canonically, alpha synuclein has been suggested to be overexpressed in sporadic parkinson’s patients.

Filtering out other signals

As was done with the TDP-43 pipeline, I had to find other signals that could be contaminating the sporadic signal. I decided to use the ALS datasets from the TDP-43 analysis to represent “general neurodegeneration”, (C9orf72, sALS, PET, RAV, SOD1, FUS) but actually there was very little overlap:

Common Up (ALS)

ALOX5 ALPK1 STOM MAFF COL7A1

Common Down (ALS)

PMS2P1 ATP6V1G2 INSIG1 RRAGA TPGS2 MTCH2 PPP2CA GLO1 PDHB

In a similar vein I decided to source some sporadic Parkinson’s disease blood - to make sure it was leaving a more tissue specific signal. I found two datasets on GEO - GSE99039 (AMA) and GSE72267 (RON). These were bigger datasets than I usually use - AMA was total 353 samples, RON 59. What I realised was that my filters for the presence calls were still 2. This meant that 24,000 genes/probes passed through filtering which was making huge overlaps. I realised a filter of 2 wasn’t really proportional to the number of samples in AMA, so I increased this to 20 (as the dataset is approximately 10 times larger than the other datasets I have been using).

These two datasets produced a common gene expression signature of approximately 5000 genes, which is quite a lot of concordance. When you compare this to the brain signal, there is an overlap of 88 upregulated and 76 downregulated genes. This is quite a small overlap considering the size of the blood signature.

At this point, the remaining gene list for the sporadic signature is 558 genes (look for Filt_ALS_blood files)

Enrichr

Down http://amp.pharm.mssm.edu/Enrichr/enrich?dataset=3rwmj

Up http://amp.pharm.mssm.edu/Enrichr/enrich?dataset=3rwmr

CON DATA

There was a dataset I labelled “CON” with the GSEnumber GSE4758 that I included for a little while as it was a mouse model of alpha synuclein which I later removed as the research question changed.

Adding LRRK2 Data

Oliver Bandmann mentioned that there is a phenomenon that LRRK2 patients exhibit their disease more closely to sporadic patients than other familial patients. I thought this was interesting and could be a good application of my methdology.

I sourced two datasets with LRRK2 patient data - GSE34516 and GSE23290. These were exon array so had to first be preprocessed by Wenbin in the Affymetrix Console using RMA-sketch. The output was normalised expression data. Mas5Calls didn’t seem to work so I skipped this step.

#### BOT GSE34516 #########
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Data/")
Data <- read.csv("GSE34516.RMA-GENE-EXTENDED-EC-hg19-na36_LRRK2only.csv", row.names = 1)

genename <- as.data.frame(Data$GeneSymbol)
rownames(genename) <- rownames(Data)
colnames(genename) <- "Gene.Symbol"
#Data is already normalised using RMA-sketch
analysis.name<-"BOT" #Label analysis
expressionMatrix<-Data[,1:6] #takes expression from normalised expression set


Treat<-factor(rep(c("Control", "Patient"),c(4,2)), levels=c("Control", "Patient"))
design<-model.matrix(~Treat)
rownames(design)<-colnames(expressionMatrix)
design

#Conduct statistical analysis of expression
library(limma)
fit<-lmFit(expressionMatrix, design) #linear model fit
fit<-eBayes(fit) 
result<-topTable(fit, coef="TreatPatient", adjust="BH", number=nrow(expressionMatrix)) #"BH" adjust for multiple hypothesis testing
#toptable normally takes top number but this takes all
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression")
write.csv(result, file = paste(analysis.name, "_result.csv", sep = ""))


result$"Fold Change"<-2^result$logFC 
result$"Fold Change"[result$"Fold Change"<1]<-(-1)/result$"Fold Change"[result$"Fold Change"<1] #converts log fold change into a linear value above or below 0
expressionLinear<-as.data.frame(2^expressionMatrix)
expressionLinear$ProbeSetID<-rownames(expressionLinear)
result<-merge(result, expressionLinear, by = 0) #merge values into one array
result<-merge(result, genename, by.x = "Row.names", by.y = 0)
result<-subset(result, subset=(Gene.Symbol !="---")) #if no gene symbol, discount


setwd(dir = "/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression/")
write.csv(result, file=paste(analysis.name, "_finalresult.csv", sep=""), row.names=FALSE, quote = FALSE)

genesort <- result[order(result$P.Value),]
uniqueresult <- genesort[!duplicated(genesort[,16]),]
write.csv(uniqueresult, file=paste(analysis.name, "rankeduniqueresult.csv", sep=""), row.names=FALSE, quote = FALSE)

#### BOT2 GSE23290 #########
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Data/")
Data <- read.csv("GSE23290.RMA-GENE-EXTENDED-EC-hg19-na36_LRRK2only.csv", row.names = 1)

genename <- as.data.frame(Data$GeneSymbol)
rownames(genename) <- rownames(Data)
colnames(genename) <- "Gene.Symbol"
#Data is already normalised using RMA-sketch
analysis.name<-"BOT2" #Label analysis
expressionMatrix<-Data[,1:8] #takes expression from normalised expression set


Treat<-factor(rep(c("Control", "Patient"),c(5,3)), levels=c("Control", "Patient"))
design<-model.matrix(~Treat)
rownames(design)<-colnames(expressionMatrix)
design

#Conduct statistical analysis of expression
library(limma)
fit<-lmFit(expressionMatrix, design) #linear model fit
fit<-eBayes(fit) 
result<-topTable(fit, coef="TreatPatient", adjust="BH", number=nrow(expressionMatrix)) #"BH" adjust for multiple hypothesis testing
#toptable normally takes top number but this takes all
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression")
write.csv(result, file = paste(analysis.name, "_result.csv", sep = ""))


result$"Fold Change"<-2^result$logFC 
result$"Fold Change"[result$"Fold Change"<1]<-(-1)/result$"Fold Change"[result$"Fold Change"<1] #converts log fold change into a linear value above or below 0
expressionLinear<-as.data.frame(2^expressionMatrix)
expressionLinear$ProbeSetID<-rownames(expressionLinear)
result<-merge(result, expressionLinear, by = 0) #merge values into one array
result<-merge(result, genename, by.x = "Row.names", by.y = 0)
result<-subset(result, subset=(Gene.Symbol !="---")) #if no gene symbol, discount


setwd(dir = "/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression/")
write.csv(result, file=paste(analysis.name, "_finalresult.csv", sep=""), row.names=FALSE, quote = FALSE)

genesort <- result[order(result$P.Value),]
uniqueresult <- genesort[!duplicated(genesort[,18]),]
write.csv(uniqueresult, file=paste(analysis.name, "rankeduniqueresult.csv", sep=""), row.names=FALSE, quote = FALSE)

I had to skip out a number of steps because it didn’t require the annotation file. There were about 28000 rows left over which was quite a lot, though approximately 7000 were LOC transcripts so took us down to the number of genes you would expect.

After adding these two datasets to the FoldChange_PD.R script, and filtering out for ALS and sporadic PD blood, there were 85 upregulated and 109 downregulated genes. This still included MSX1, SNCA, UCHL1, and ALDH1A1 from PD Malacards but not PINK1 anymore.

EnrichR UP: http://amp.pharm.mssm.edu/Enrichr/enrich?dataset=3t9ff DOWN: http://amp.pharm.mssm.edu/Enrichr/enrich?dataset=3t9fh

Change of direction

The other datasets I had were fibroblasts from PARK2 and LRRK2 patients as provided by Robin Highly/Oliver Bandmann/Heather Mortiboys. These were exon array datasets which were analysed by Wenbin using the Affymetrix console. When these were added to the filters, the number of common genes was 36, only one of which MSX1 was retained.

To give a little more signal, but to also see if the signal could be recovered, I discluded the blood data (as fibroblast is representative enough of a non-neuronal tissue), leaving 54 genes

DOCK2 RAD52 TNFAIP8 P4HA1 MS4A6A CD300A KLHL36 TRIM5 CD163 CXCR4 RIPK2 PAWR ZNF516 JMJD6 CASP7 MSX1 IGF1R LRRC1 SAFB2 C1QB IL17RB LAMB2 CCDC40 CDA FMO5 DNASE2 NFE2 ZNF217 HMOX1 PARP16 HP1BP3 PGPEP1 CD14 NPIPB15 IMP4 LETMD1 AP1S1 FHIT MECR ATP2B2 DCTPP1 WDYHV1 PSMC2 NRXN3 MRTO4 NDUFS3 SLC6A1 NME1 MRPL15 COPS3 TUBG1 PRC1 BFSP1 IARS

Approximately 10 of these genes have a relatively strong relationship with Parkinson’s or LRRK2 specifically. 32 were directly related according to VarElect, 22 indirectly related (VarElect_Results_cmgreen1-sheffield-ac-uk-20180522-014621619).

The next step was to find the PPI partners of these 54 genes.

PPI

#### ALS PARK2 LRRK2 ####

library(biomaRt)

setwd("/Users/clairegreen/Documents/PhD/TDP-43/TDP-43_Code/Results/PPI_Network/")

PPI <- read.table("iref14_Human_UP_noDup_table_nodash.txt", header = T)
braingenes <- read.csv("Zhang_BrainCelltype_Markers_braingenes.csv", header = T)

DEG_list <- readLines("~/Documents/PhD/Parkinsons/Parkinsons_Code/Results/ALSPARK2LRRK2/removeallALLgenes.txt")

mart <- useMart("ENSEMBL_MART_ENSEMBL",dataset="hsapiens_gene_ensembl", host="www.ensembl.org")
attributes <- listAttributes(mart)
mart_back <- getBM(attributes =c("hgnc_symbol", "uniprotswissprot"), filters="hgnc_symbol", values=DEG_list,  mart=mart)
genelist_Uniprot <- subset(mart_back, !(mart_back$uniprotswissprot == ""))
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/ALSPARK2LRRK2/")
write.csv(genelist_Uniprot, "martback.csv", row.names = F)

# IDENTIFY MISSING GENES AND FIND UNIPROT CODES FOR THEM. NB SOME GENES MAY NOT BE PROTEIN CODING #

mart_table <- read.csv("martback.csv", header = T) #A table with the uniprot codes for the DEGs
uniprot_gene <- mart_table$uniprotswissprot

DEG_PPI <- subset(PPI, PPI$V1 %in% uniprot_gene | PPI$V2 %in% uniprot_gene)
rownames(DEG_PPI) <- 1:nrow(DEG_PPI)

write.csv(DEG_PPI, "DEG_PPI_ALP.csv", row.names = F)

DEG_PPI <- read.csv("DEG_PPI_ALP.csv")

DEG_PPI <- subset(DEG_PPI, DEG_PPI$Gene1 !="-")
DEG_PPI <- subset(DEG_PPI, DEG_PPI$Gene2 !="-")

write.csv(DEG_PPI, "FinalPDPPI.csv", row.names = F, quote = F)

From this we have a network of 1367 interactions between 1034 proteins (file = PD_DEGPPI.R, DEG_PPI_ALP.csv, ALP_PPIgenes.txt. Gene names used).

Next I had to find the corresonding rows in each of the RNA expression data files ready for the correlation anaylysis. The LRRK2 samples were not included in the coexpression analysis because with only 2 and 3 patients respectively, it would results in rho values of 1/-1 or 1, 0.5, 0, -0.5 or 1, which are difficult to threshold. I also removed the MOR.FC dataset as the distribution of Rho values was abnormally distributed

[]

#### ALP####
#Read in network nodes
DEG_PPI <- readLines("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/ALSPARK2LRRK2/ALP_PPIGenes.txt")

setwd("/users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression/")
#Extract PPI network genes from each dataset#####
LEW <- read.csv("LEWfilteredresult.csv")
rownames(LEW) <- LEW$Gene.Symbol
LEW <- LEW[,28:41]
LEW <- subset(LEW, rownames(LEW) %in% DEG_PPI)

MID3 <- read.csv("MID3filteredresult.csv")
rownames(MID3) <- MID3$Gene.Symbol
MID3 <- MID3[,36:49]
MID3 <- subset(MID3, rownames(MID3) %in% DEG_PPI)

MID4 <- read.csv("MID4filteredresult.csv")
rownames(MID4) <- MID4$Gene.Symbol
MID4 <- MID4[,41:55]
MID4 <- subset(MID4, rownames(MID4) %in% DEG_PPI)

MOR.FC <- read.csv("MOR.FCfilteredresult.csv")
rownames(MOR.FC) <- MOR.FC$Gene.Symbol
MOR.FC <- MOR.FC[,24:28]
MOR.FC <- subset(MOR.FC, rownames(MOR.FC) %in% DEG_PPI)

DIJ <- read.csv("DIJfilteredresult.csv")
rownames(DIJ) <- DIJ$Gene.Symbol
DIJ <- DIJ[,29:43]
DIJ <- subset(DIJ, rownames(DIJ) %in% DEG_PPI)

FFR <- read.csv("FFRfilteredresult.csv")
rownames(FFR) <- FFR$Gene.Symbol
FFR <- FFR[,29:44]
FFR <- subset(FFR, rownames(FFR) %in% DEG_PPI)

MID1 <- read.csv("MID1filteredresult.csv")
rownames(MID1) <- MID1$Gene.Symbol
MID1 <- MID1[,28:37]
MID1 <- subset(MID1, rownames(MID1) %in% DEG_PPI)

MID2 <- read.csv("MID2filteredresult.csv")
rownames(MID2) <- MID2$Gene.Symbol
MID2 <- MID2[,39:49]
MID2 <- subset(MID2, rownames(MID2) %in% DEG_PPI)

MOR.SN <- read.csv("MOR.SNfilteredresult.csv")
rownames(MOR.SN) <- MOR.SN$Gene.Symbol
MOR.SN <- MOR.SN[,36:59]
MOR.SN <- subset(MOR.SN, rownames(MOR.SN) %in% DEG_PPI)

DUM <- read.csv("DUM_UniqueGene_DESeq2.csv")
rownames(DUM) <- DUM$hgnc_symbol
DUM <- DUM[,53:81]
DUM <- subset(DUM, rownames(DUM) %in% DEG_PPI)




#Find the gene names that all datasets have in common
DEG_com <- Reduce(intersect, list(rownames(DIJ), rownames(FFR), 
                                  rownames(LEW),rownames(MID1),rownames(MID2),
                                  rownames(MID3),rownames(MID4),rownames(MOR.FC),
                                  rownames(MOR.SN), rownames(DUM)))

This script resulted in 854 common names. Correlation analysis was conducted on sharc (/shared/hidelab2/user/mdp15cmg/Parkinsons/PD_Cor)

When the correlation relationships were downloaded and analysed, only 59 edges reached a consistent correlation value of over 0.1 or below -0.1 regardless of sign. This increases to 208 when the RNA-seq dataset is removed.

Yet Another New Approach

Talking to Kat we hypothesised that the Parkin Fibroblasts were perhaps driving away the signal too strongly. Also the number of samples was so small that it’s not necessarily giving a representative signal.

To compensate, I decided to go back to the blood samples and use some other samples from the AMA GSE99039 dataset which have familial mutations. There were 5 ATP13A2 cases, 15 Parkin cases, and 12 PINK1 cases. Two of the Parkin cases in the metadata didn’t have corresponding files, so this became 13 Parkin cases.

Limma

##### AMA ATP13A2 #########
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Data/GSE99039_keepfiles/")

#run program to choose .CEL files from directory
celfiles <- fileBrowser(textToShow = "Choose CEL files", testFun = hasSuffix("[cC][eE][lL]"))
Data<-ReadAffy(filenames=celfiles) #read in files
rmaEset<-rma(Data) #normalise using RMA
analysis.name<-"AMA_ATP13A2" #Label analysis
dataMatrixAll<-exprs(rmaEset) #takes expression from normalised expression set


#mas5call generates presence/absence calls for each probeset
mas5call<-mas5calls(Data)
callMatrixAll<-exprs(mas5call)
colnames(callMatrixAll)<-sub(".CEL", ".mas5-Detection", colnames(callMatrixAll),fixed=TRUE)
colnames(callMatrixAll)<-sub(".cel", ".mas5-Detection", colnames(callMatrixAll),fixed=TRUE)
callMatrixAll<-as.data.frame(callMatrixAll)
callMatrixAll$ProbeSetID<-rownames(callMatrixAll)
countPf<-function(x){
  sum(x=="P")
}

#count how many samples have presence calls
countPl<-apply(callMatrixAll, 1, countPf)
callMatrixAll$ProbeSetID<-rownames(callMatrixAll)
countPdf<-data.frame(ProbeSetID=names(countPl), countP=countPl) 

#read annotation
# USING ANNOTATION FILE (if .csv, convert to .txt using excel)
annotation.file<-"/Users/clairegreen/Documents/PhD/TDP-43/TDP-43_Data/HG-U133_Plus_2.na35.annot.csv/HG-U133_Plus_2.na35_SHORT.annot.txt"
annotation<-read.table(annotation.file, header = TRUE, row.names=NULL, sep="\t", skip=0, stringsAsFactors=F, quote = "", comment.char="!", fill = TRUE, as.is = TRUE)
dim(annotation)
nrow(annotation)
annotation<-subset( annotation, subset=(Gene.Symbol !="---")) #if no gene symbol, discount

# Remove rows in which genes are noted to have negative strand matching probes
idxNegativeStrand<-grep("Negative Strand Matching Probes", annotation$Annotation.Notes)
if(length(idxNegativeStrand)>0)
{
  annotation<-annotation[-idxNegativeStrand,]
}


expressionMatrix<-exprs(rmaEset)
colnames(expressionMatrix)

#this is for matched samples
Treat<-factor(rep(c("Control", "Patient"),c(183,5)), levels=c("Control", "Patient"))
design<-model.matrix(~Treat)
rownames(design)<-colnames(expressionMatrix)
design

#Conduct statistical analysis of expression
library(limma)
fit<-lmFit(expressionMatrix, design) #linear model fit
fit<-eBayes(fit) 
result<-topTable(fit, coef="TreatPatient", adjust="BH", number=nrow(expressionMatrix)) #"BH" adjust for multiple hypothesis testing
#toptable normally takes top number but this takes all
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression")
write.csv(result, file = paste(analysis.name, "_result.csv", sep = ""))

result$"ProbeSetID"<-rownames(result) #make probeset IDs the row names
head(result$"ProbeSetID") 
result$"Fold Change"<-2^result$logFC 
result$"Fold Change"[result$"Fold Change"<1]<-(-1)/result$"Fold Change"[result$"Fold Change"<1] #converts log fold change into a linear value above or below 0
expressionLinear<-as.data.frame(2^expressionMatrix)
expressionLinear$ProbeSetID<-rownames(expressionLinear)
result<-merge(result, expressionLinear, by.x="ProbeSetID", by.y="ProbeSetID") #merge values into one array
result<-merge(annotation, result, by.x="Probe.Set.ID", by.y="ProbeSetID")
result<-merge(result, countPdf, by.x="Probe.Set.ID", by.y="ProbeSetID")
result$Gene.Symbol <- sapply(strsplit(result$Gene.Symbol,"///"), `[`, 1)
result$Ensembl <- sapply(strsplit(result$Ensembl,"///"), `[`, 1)

setwd(dir = "/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression/")
write.csv(result, file=paste(analysis.name, "_finalresult.csv", sep=""), row.names=FALSE, quote = FALSE)

genesort <- result[order(result$P.Value),]
uniqueresult <- genesort[!duplicated(genesort[,5]),]
write.csv(uniqueresult, file=paste(analysis.name, "rankeduniqueresult.csv", sep=""), row.names=FALSE, quote = FALSE)

uniqueresult<-subset(uniqueresult, Gene.Symbol!="") #removes any probes for which there are no gene symbols
uniqueresult<-subset(uniqueresult, subset=(countP>20)) #only takes results that have at least 2 samples with a presence call for a probe
write.csv(uniqueresult, file=paste(analysis.name, "filteredresult.csv", sep=""), row.names=FALSE, quote = FALSE)

##### AMA PINK1 ##########
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Data/GSE99039_keepfiles/")

#run program to choose .CEL files from directory
celfiles <- fileBrowser(textToShow = "Choose CEL files", testFun = hasSuffix("[cC][eE][lL]"))
Data<-ReadAffy(filenames=celfiles) #read in files
rmaEset<-rma(Data) #normalise using RMA
analysis.name<-"AMA_PINK1" #Label analysis
dataMatrixAll<-exprs(rmaEset) #takes expression from normalised expression set


#mas5call generates presence/absence calls for each probeset
mas5call<-mas5calls(Data)
callMatrixAll<-exprs(mas5call)
colnames(callMatrixAll)<-sub(".CEL", ".mas5-Detection", colnames(callMatrixAll),fixed=TRUE)
colnames(callMatrixAll)<-sub(".cel", ".mas5-Detection", colnames(callMatrixAll),fixed=TRUE)
callMatrixAll<-as.data.frame(callMatrixAll)
callMatrixAll$ProbeSetID<-rownames(callMatrixAll)
countPf<-function(x){
  sum(x=="P")
}

#count how many samples have presence calls
countPl<-apply(callMatrixAll, 1, countPf)
callMatrixAll$ProbeSetID<-rownames(callMatrixAll)
countPdf<-data.frame(ProbeSetID=names(countPl), countP=countPl) 

#read annotation
# USING ANNOTATION FILE (if .csv, convert to .txt using excel)
annotation.file<-"/Users/clairegreen/Documents/PhD/TDP-43/TDP-43_Data/HG-U133_Plus_2.na35.annot.csv/HG-U133_Plus_2.na35_SHORT.annot.txt"
annotation<-read.table(annotation.file, header = TRUE, row.names=NULL, sep="\t", skip=0, stringsAsFactors=F, quote = "", comment.char="!", fill = TRUE, as.is = TRUE)
dim(annotation)
nrow(annotation)
annotation<-subset( annotation, subset=(Gene.Symbol !="---")) #if no gene symbol, discount

# Remove rows in which genes are noted to have negative strand matching probes
idxNegativeStrand<-grep("Negative Strand Matching Probes", annotation$Annotation.Notes)
if(length(idxNegativeStrand)>0)
{
  annotation<-annotation[-idxNegativeStrand,]
}


expressionMatrix<-exprs(rmaEset)
colnames(expressionMatrix)

#this is for matched samples
Treat<-factor(rep(c("Control", "Patient"),c(183,12)), levels=c("Control", "Patient"))
design<-model.matrix(~Treat)
rownames(design)<-colnames(expressionMatrix)
design

#Conduct statistical analysis of expression
library(limma)
fit<-lmFit(expressionMatrix, design) #linear model fit
fit<-eBayes(fit) 
result<-topTable(fit, coef="TreatPatient", adjust="BH", number=nrow(expressionMatrix)) #"BH" adjust for multiple hypothesis testing
#toptable normally takes top number but this takes all
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression")
write.csv(result, file = paste(analysis.name, "_result.csv", sep = ""))

result$"ProbeSetID"<-rownames(result) #make probeset IDs the row names
head(result$"ProbeSetID") 
result$"Fold Change"<-2^result$logFC 
result$"Fold Change"[result$"Fold Change"<1]<-(-1)/result$"Fold Change"[result$"Fold Change"<1] #converts log fold change into a linear value above or below 0
expressionLinear<-as.data.frame(2^expressionMatrix)
expressionLinear$ProbeSetID<-rownames(expressionLinear)
result<-merge(result, expressionLinear, by.x="ProbeSetID", by.y="ProbeSetID") #merge values into one array
result<-merge(annotation, result, by.x="Probe.Set.ID", by.y="ProbeSetID")
result<-merge(result, countPdf, by.x="Probe.Set.ID", by.y="ProbeSetID")
result$Gene.Symbol <- sapply(strsplit(result$Gene.Symbol,"///"), `[`, 1)
result$Ensembl <- sapply(strsplit(result$Ensembl,"///"), `[`, 1)

setwd(dir = "/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression/")
write.csv(result, file=paste(analysis.name, "_finalresult.csv", sep=""), row.names=FALSE, quote = FALSE)

genesort <- result[order(result$P.Value),]
uniqueresult <- genesort[!duplicated(genesort[,5]),]
write.csv(uniqueresult, file=paste(analysis.name, "rankeduniqueresult.csv", sep=""), row.names=FALSE, quote = FALSE)

uniqueresult<-subset(uniqueresult, Gene.Symbol!="") #removes any probes for which there are no gene symbols
uniqueresult<-subset(uniqueresult, subset=(countP>20)) #only takes results that have at least 2 samples with a presence call for a probe
write.csv(uniqueresult, file=paste(analysis.name, "filteredresult.csv", sep=""), row.names=FALSE, quote = FALSE)

##### AMA PARKIN ##########
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Data/GSE99039_keepfiles/")

#run program to choose .CEL files from directory
celfiles <- fileBrowser(textToShow = "Choose CEL files", testFun = hasSuffix("[cC][eE][lL]"))
Data<-ReadAffy(filenames=celfiles) #read in files
rmaEset<-rma(Data) #normalise using RMA
analysis.name<-"AMA_PARKIN" #Label analysis
dataMatrixAll<-exprs(rmaEset) #takes expression from normalised expression set


#mas5call generates presence/absence calls for each probeset
mas5call<-mas5calls(Data)
callMatrixAll<-exprs(mas5call)
colnames(callMatrixAll)<-sub(".CEL", ".mas5-Detection", colnames(callMatrixAll),fixed=TRUE)
colnames(callMatrixAll)<-sub(".cel", ".mas5-Detection", colnames(callMatrixAll),fixed=TRUE)
callMatrixAll<-as.data.frame(callMatrixAll)
callMatrixAll$ProbeSetID<-rownames(callMatrixAll)
countPf<-function(x){
  sum(x=="P")
}

#count how many samples have presence calls
countPl<-apply(callMatrixAll, 1, countPf)
callMatrixAll$ProbeSetID<-rownames(callMatrixAll)
countPdf<-data.frame(ProbeSetID=names(countPl), countP=countPl) 

#read annotation
# USING ANNOTATION FILE (if .csv, convert to .txt using excel)
annotation.file<-"/Users/clairegreen/Documents/PhD/TDP-43/TDP-43_Data/HG-U133_Plus_2.na35.annot.csv/HG-U133_Plus_2.na35_SHORT.annot.txt"
annotation<-read.table(annotation.file, header = TRUE, row.names=NULL, sep="\t", skip=0, stringsAsFactors=F, quote = "", comment.char="!", fill = TRUE, as.is = TRUE)
dim(annotation)
nrow(annotation)
annotation<-subset( annotation, subset=(Gene.Symbol !="---")) #if no gene symbol, discount

# Remove rows in which genes are noted to have negative strand matching probes
idxNegativeStrand<-grep("Negative Strand Matching Probes", annotation$Annotation.Notes)
if(length(idxNegativeStrand)>0)
{
  annotation<-annotation[-idxNegativeStrand,]
}


expressionMatrix<-exprs(rmaEset)
colnames(expressionMatrix)

#this is for matched samples
Treat<-factor(rep(c("Control", "Patient"),c(183,13)), levels=c("Control", "Patient"))
design<-model.matrix(~Treat)
rownames(design)<-colnames(expressionMatrix)
design

#Conduct statistical analysis of expression
library(limma)
fit<-lmFit(expressionMatrix, design) #linear model fit
fit<-eBayes(fit) 
result<-topTable(fit, coef="TreatPatient", adjust="BH", number=nrow(expressionMatrix)) #"BH" adjust for multiple hypothesis testing
#toptable normally takes top number but this takes all
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression")
write.csv(result, file = paste(analysis.name, "_result.csv", sep = ""))

result$"ProbeSetID"<-rownames(result) #make probeset IDs the row names
head(result$"ProbeSetID") 
result$"Fold Change"<-2^result$logFC 
result$"Fold Change"[result$"Fold Change"<1]<-(-1)/result$"Fold Change"[result$"Fold Change"<1] #converts log fold change into a linear value above or below 0
expressionLinear<-as.data.frame(2^expressionMatrix)
expressionLinear$ProbeSetID<-rownames(expressionLinear)
result<-merge(result, expressionLinear, by.x="ProbeSetID", by.y="ProbeSetID") #merge values into one array
result<-merge(annotation, result, by.x="Probe.Set.ID", by.y="ProbeSetID")
result<-merge(result, countPdf, by.x="Probe.Set.ID", by.y="ProbeSetID")
result$Gene.Symbol <- sapply(strsplit(result$Gene.Symbol,"///"), `[`, 1)
result$Ensembl <- sapply(strsplit(result$Ensembl,"///"), `[`, 1)

setwd(dir = "/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression/")
write.csv(result, file=paste(analysis.name, "_finalresult.csv", sep=""), row.names=FALSE, quote = FALSE)

genesort <- result[order(result$P.Value),]
uniqueresult <- genesort[!duplicated(genesort[,5]),]
write.csv(uniqueresult, file=paste(analysis.name, "rankeduniqueresult.csv", sep=""), row.names=FALSE, quote = FALSE)

uniqueresult<-subset(uniqueresult, Gene.Symbol!="") #removes any probes for which there are no gene symbols
uniqueresult<-subset(uniqueresult, subset=(countP>20)) #only takes results that have at least 2 samples with a presence call for a probe
write.csv(uniqueresult, file=paste(analysis.name, "filteredresult.csv", sep=""), row.names=FALSE, quote = FALSE)

Common Differentially Expressed Genes (Blood Edition)

#############################################################
################### FOLD CHANGE DEG #########################
#############################################################

setwd("/users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression/")

LEW <- read.csv("LEWfilteredresult.csv")
LEW <- LEW[order(LEW$P.Value),]
MID3 <- read.csv("MID3filteredresult.csv")
MID3 <- MID3[order(MID3$P.Value),]
MID4 <- read.csv("MID4filteredresult.csv")
MID4 <- MID4[order(MID4$P.Value),]
MOR.FC <- read.csv("MOR.FCfilteredresult.csv")
MOR.FC <- MOR.FC[order(MOR.FC$P.Value),]
DIJ <- read.csv("DIJfilteredresult.csv")
DIJ <- DIJ[order(DIJ$P.Value),]
FFR <- read.csv("FFRfilteredresult.csv")
FFR <- FFR[order(FFR$P.Value),]
MID1 <- read.csv("MID1filteredresult.csv")
MID1 <- MID1[order(MID1$P.Value),]
MID2 <- read.csv("MID2filteredresult.csv")
MID2 <- MID2[order(MID2$P.Value),]
MOR.SN <- read.csv("MOR.SNfilteredresult.csv")
MOR.SN <- MOR.SN[order(MOR.SN$P.Value),]
DUM <- read.csv("DUM_UniqueGene_DESeq2.csv")
DUM <- DUM[order(DUM$pvalue),]
BOT <- read.csv("BOTrankeduniqueresult.csv")
BOT <- BOT[order(BOT$P.Value),]
BOT2 <- read.csv("BOT2rankeduniqueresult.csv")
BOT2 <- BOT2[order(BOT2$P.Value),]




thresh <- 1

upLEW <- subset(LEW, LEW$Fold.Change >= thresh)
upLEWgene <- upLEW$Gene.Symbol

upMID3<- subset(MID3, MID3$Fold.Change >= thresh)
upMID3gene <- upMID3$Gene.Symbol

upMID4 <- subset(MID4, MID4$Fold.Change >= thresh)
upMID4gene <- upMID4$Gene.Symbol

upMOR.FC <- subset(MOR.FC, MOR.FC$Fold.Change >= thresh)
upMOR.FCgene <- upMOR.FC$Gene.Symbol

upDUM <- subset(DUM, DUM$log2FoldChange >= 0)
upDUMgene <- upDUM$hgnc_symbol

upDIJ <- subset(DIJ, DIJ$Fold.Change >= thresh)
upDIJgene <- upDIJ$Gene.Symbol

upFFR<- subset(FFR, FFR$Fold.Change >= thresh)
upFFRgene <- upFFR$Gene.Symbol

upMID1<- subset(MID1, MID1$Fold.Change >= thresh)
upMID1gene <- upMID1$Gene.Symbol

upMID2 <- subset(MID2, MID2$Fold.Change >= thresh)
upMID2gene <- upMID2$Gene.Symbol

upMOR.SN <- subset(MOR.SN, MOR.SN$Fold.Change >= thresh)
upMOR.SNgene <- upMOR.SN$Gene.Symbol

upBOT <- subset(BOT, BOT$Fold.Change >= thresh)
upBOTgene <- upBOT$Gene.Symbol

upBOT2 <- subset(BOT2, BOT2$Fold.Change >= thresh)
upBOT2gene <- upBOT2$Gene.Symbol


INTUP <- Reduce(intersect, list(upLEWgene, upMID3gene, upMID4gene, upMOR.FCgene,
                                upDIJgene, upFFRgene, upMID1gene, upMID2gene, upMOR.SNgene, upDUMgene, upBOTgene, upBOT2gene))



#### DOWN ####
thresh <- -1

downLEW <- subset(LEW, LEW$Fold.Change <= thresh)
downLEWgene <- downLEW$Gene.Symbol

downMID3 <- subset(MID3, MID3$Fold.Change <= thresh)
downMID3gene <- downMID3$Gene.Symbol

downMID4 <- subset(MID4, MID4$Fold.Change <= thresh)
downMID4gene <- downMID4$Gene.Symbol

downMOR.FC <- subset(MOR.FC, MOR.FC$Fold.Change <= thresh)
downMOR.FCgene <- downMOR.FC$Gene.Symbol

downDUM <- subset(DUM, DUM$log2FoldChange <= 0)
downDUMgene <- downDUM$hgnc_symbol

downDIJ <- subset(DIJ, DIJ$Fold.Change <= thresh)
downDIJgene <- downDIJ$Gene.Symbol

downFFR<- subset(FFR, FFR$Fold.Change <= thresh)
downFFRgene <- downFFR$Gene.Symbol

downMID1<- subset(MID1, MID1$Fold.Change <= thresh)
downMID1gene <- downMID1$Gene.Symbol

downMID2 <- subset(MID2, MID2$Fold.Change <= thresh)
downMID2gene <- downMID2$Gene.Symbol

downMOR.SN <- subset(MOR.SN, MOR.SN$Fold.Change <= thresh)
downMOR.SNgene <- downMOR.SN$Gene.Symbol


downBOT <- subset(BOT, BOT$Fold.Change <= thresh)
downBOTgene <- downBOT$Gene.Symbol

downBOT2 <- subset(BOT2, BOT2$Fold.Change <= thresh)
downBOT2gene <- downBOT2$Gene.Symbol

INTDOWN <- Reduce(intersect, list(downLEWgene, downMID3gene, downMID4gene, downMOR.FCgene,
                                  downDIJgene, downFFRgene, downMID1gene, downMID2gene, 
                                  downMOR.SNgene, downDUMgene, downBOTgene, downBOT2gene))




########################### COMMON GENES ##############################
all <- c(INTUP, INTDOWN)
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression/FoldChange")
# write.table(INTUP,"Sp_LRRK2_upDEGs.txt", row.names = F, col.names = F, quote = F)
# write.table(INTDOWN,"Sp_LRRK2_downDEGs.txt", row.names = F, col.names = F, quote = F)
# write.table(all, "Sp_LRRK2_allDEGs.txt", row.names = F, col.names = F, quote = F)

PDgenes <- readLines("/Users/clairegreen/Documents/PhD/Parkinsons/ParkinsonsDiseaseMalacards.txt")

intersect(INTUP, PDgenes)
intersect(INTDOWN, PDgenes)



####### ALS signature ##########

setwd("/users/clairegreen/Documents/PhD/TDP-43/TDP-43_Code/Results/GeneExpression/noMedian/")

C9 <- read.csv("C9_unique.csv")
C9 <- C9[order(C9$P.Value),]
sals <- read.csv("sals_unique.csv")
sals <- sals[order(sals$P.Value),]

setwd("/users/clairegreen/Documents/PhD/TDP-43/TDP-43_Code/Results/GeneExpression/TDP-43_DEseq2/")

pet <- read.csv("PET_results_keepfiltering.csv")
rav <- read.csv("RAV_results_keepfiltering.csv")

setwd("/users/clairegreen/Documents/PhD/TDP-43/TDP-43_Code/Results/GeneExpression/non-TDP/")

FUS <- read.csv("FUSrankeduniqueresult.csv")
FUS <- FUS[order(FUS$P.Value),]
SOD1 <- read.csv("SOD1rankeduniqueresult.csv")
SOD1 <- SOD1[order(SOD1$P.Value),]


thresh <- 1

upC9 <- subset(C9, C9$Fold.Change >= thresh)
upC9gene <- upC9$Gene.Symbol

upSALS <- subset(sals, sals$Fold.Change >= thresh)
upSALSgene <- upSALS$Gene.Symbol

upPET <- subset(pet, pet$FoldChange >= thresh)
upPETgene <- upPET$hgnc_symbol

upRAV <- subset(rav, rav$FoldChange >= thresh)
upRAVgene <- upRAV$hgnc_symbol

upFUS <- subset(FUS, FUS$Fold.Change >= thresh)
upFUSgene <- upFUS$Gene.Symbol

upSOD1 <- subset(SOD1, SOD1$Fold.Change >= thresh)
upSOD1gene <- upSOD1$Gene.Symbol

INTUP_ALS <- Reduce(intersect, list(upC9gene, upSALSgene, upPETgene, upRAVgene, upFUSgene, upSOD1gene))


#### DOWN 
thresh <- -1

downC9 <- subset(C9, C9$Fold.Change <= thresh)
downC9gene <- downC9$Gene.Symbol

downSALS <- subset(sals, sals$Fold.Change <= thresh)
downSALSgene <- downSALS$Gene.Symbol

downPET <- subset(pet, pet$FoldChange <= thresh)
downPETgene <- downPET$hgnc_symbol

downRAV <- subset(rav, rav$FoldChange <= thresh)
downRAVgene <- downRAV$hgnc_symbol

downFUS <- subset(FUS, FUS$Fold.Change <= thresh)
downFUSgene <- downFUS$Gene.Symbol

downSOD1 <- subset(SOD1, SOD1$Fold.Change <= thresh)
downSOD1gene <- downSOD1$Gene.Symbol

INTDOWN_ALS <- Reduce(intersect, list(downC9gene, downSALSgene, downPETgene, downRAVgene, downFUSgene, downSOD1gene))



##### COMMON GENES ###
upremove <- Reduce(intersect, list (INTUP, INTUP_ALS))
downremove <- Reduce(intersect, list(INTDOWN, INTDOWN_ALS))



###### REMOVE COMMON GENES ###
resultsup <- subset(INTUP, !(INTUP %in% upremove))
resultsdown <- subset(INTDOWN, !(INTDOWN %in% downremove))
results <- c(resultsup, resultsdown)

####### sPD Blood signature ##########

setwd("/users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression/")

AMA <- read.csv("AMAfilteredresult.csv")
AMA <- AMA[order(AMA$P.Value),]
RON <- read.csv("RONfilteredresult.csv")
RON <- RON[order(RON$P.Value),]


thresh <- 1

upAMA <- subset(AMA, AMA$Fold.Change >= thresh)
upAMAgene <- upAMA$Gene.Symbol

upRON <- subset(RON, RON$Fold.Change >= thresh)
upRONgene <- upRON$Gene.Symbol


INTUP_blood <- Reduce(intersect, list(upAMAgene, upRONgene))


#### DOWN ###
thresh <- -1

downAMA <- subset(AMA, AMA$Fold.Change <= thresh)
downAMAgene <- downAMA$Gene.Symbol

downRON <- subset(RON, RON$Fold.Change <= thresh)
downRONgene <- downRON$Gene.Symbol


INTDOWN_blood <- Reduce(intersect, list(downAMAgene, downRONgene))



##### COMMON GENES ###
upremove2 <- Reduce(intersect, list(INTUP, INTUP_blood))
downremove2 <- Reduce(intersect, list(INTDOWN, INTDOWN_blood))



##### REMOVE COMMON GENES ###
resultsup <- subset(resultsup, !(resultsup %in% upremove2))
resultsdown <- subset(resultsdown, !(resultsdown %in% downremove2))
results <- c(resultsup, resultsdown)




setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression/FoldChange/")
# write.table(resultsup, "Sp_LRRK2_FiltALSPDblood_UPgenes.txt", quote = F, row.names = F, col.names = F)
# write.table(resultsdown, "Sp_LRRK2_FiltALSPDblood_DOWNgenes.txt", quote = F, row.names = F, col.names = F)
# write.table(results, "Sp_LRRK2_FiltALSPDblood_ALLgenes.txt", quote = F, row.names = F, col.names = F)
# cat(resultsup, sep="\n")

intersect(resultsup, PDgenes)
intersect(resultsdown, PDgenes)


####### fPD Blood signature ##########

setwd("/users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression/")

AMA_A <- read.csv("AMA_ATP13A2filteredresult.csv")
AMA_A <- AMA_A[order(AMA_A$P.Value),]
AMA_PARK <- read.csv("AMA_PARKINfilteredresult.csv")
AMA_PARK <- AMA_PARK[order(AMA_PARK$P.Value),]
AMA_PINK <- read.csv("AMA_PINK1filteredresult.csv")
AMA_PINK <- AMA_PINK[order(AMA_PINK$P.Value),]


thresh <- 1

upAMA_A <- subset(AMA_A, AMA_A$Fold.Change >= thresh)
upAMA_Agene <- AMA_A$Gene.Symbol

upAMA_PARK<- subset(AMA_PARK, AMA_PARK$Fold.Change >= thresh)
upAMA_PARKgene <- upAMA_PARK$Gene.Symbol

upAMA_PINK<- subset(AMA_PINK, AMA_PINK$Fold.Change >= thresh)
upAMA_PINKgene <- upAMA_PINK$Gene.Symbol


INTUP_fam_blood <- Reduce(intersect, list(upAMA_Agene, upAMA_PARKgene, upAMA_PINKgene))


#### DOWN ###
thresh <- -1

downAMA_A <- subset(AMA_A, AMA_A$Fold.Change <= thresh)
downAMA_Agene <- AMA_A$Gene.Symbol

downAMA_PARK<- subset(AMA_PARK, AMA_PARK$Fold.Change <= thresh)
downAMA_PARKgene <- downAMA_PARK$Gene.Symbol

downAMA_PINK<- subset(AMA_PINK, AMA_PINK$Fold.Change <= thresh)
downAMA_PINKgene <- downAMA_PINK$Gene.Symbol


INTDOWN_fam_blood <- Reduce(intersect, list(downAMA_Agene, downAMA_PARKgene, downAMA_PINKgene))



##### COMMON GENES ###
upremove3 <- Reduce(intersect, list(INTUP, INTUP_fam_blood))
downremove3 <- Reduce(intersect, list(INTDOWN, INTDOWN_fam_blood))



##### REMOVE COMMON GENES ###
resultsup <- subset(resultsup, !(resultsup %in% upremove3))
resultsdown <- subset(resultsdown, !(resultsdown %in% downremove3))
results <- c(resultsup, resultsdown)




setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/FamilialBlood/")
write.table(resultsup, "ALS_sfblood_UPgenes.txt", quote = F, row.names = F, col.names = F)
write.table(resultsdown, "ALS_sfblood_DOWNgenes.txt", quote = F, row.names = F, col.names = F)
write.table(results, "ALS_sfblood_ALLgenes.txt", quote = F, row.names = F, col.names = F)
cat(resultsup, sep="\n")

intersect(resultsup, PDgenes)
intersect(resultsdown, PDgenes)

This produced 170 commonly differentially expressed genes (ALS_sfblood_ALLgenes.txt). These genes included Malacards SCNA, MSX1, UCHL1, ALDH1A1 .

Protein Interaction Expansion

The 170 gene seed created a network of 2159 proteins (/familialblood/PPIgenes). EnrichR (http://amp.pharm.mssm.edu/Enrichr/enrich?dataset=3ztlg) shows high enrichment for a number of processes, making it hard to know exactly what the geneset represents. However for GO Cellular component the highest enrichments are still for the mitochondria

In this case, I am somewhat confident that I am looking in the right area. The only way we will know if it is, however, is to look at whether there is common coexpression

Common Coexpression in LRRK2/Sporadic PD

#### Parkinson's DEG PPI Correlation ####

#Read in network nodes

#### Familial Blood ####
#Read in network nodes
DEG_PPI <- readLines("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/FamilialBlood/PPIGenes.txt")

setwd("/users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/GeneExpression/")
#Extract PPI network genes from each dataset#####
LEW <- read.csv("LEWfilteredresult.csv")
rownames(LEW) <- LEW$Gene.Symbol
LEW <- LEW[,28:41]
LEW <- subset(LEW, rownames(LEW) %in% DEG_PPI)

MID3 <- read.csv("MID3filteredresult.csv")
rownames(MID3) <- MID3$Gene.Symbol
MID3 <- MID3[,36:49]
MID3 <- subset(MID3, rownames(MID3) %in% DEG_PPI)

MID4 <- read.csv("MID4filteredresult.csv")
rownames(MID4) <- MID4$Gene.Symbol
MID4 <- MID4[,41:55]
MID4 <- subset(MID4, rownames(MID4) %in% DEG_PPI)

MOR.FC <- read.csv("MOR.FCfilteredresult.csv")
rownames(MOR.FC) <- MOR.FC$Gene.Symbol
MOR.FC <- MOR.FC[,24:28]
MOR.FC <- subset(MOR.FC, rownames(MOR.FC) %in% DEG_PPI)

DIJ <- read.csv("DIJfilteredresult.csv")
rownames(DIJ) <- DIJ$Gene.Symbol
DIJ <- DIJ[,29:43]
DIJ <- subset(DIJ, rownames(DIJ) %in% DEG_PPI)

FFR <- read.csv("FFRfilteredresult.csv")
rownames(FFR) <- FFR$Gene.Symbol
FFR <- FFR[,29:44]
FFR <- subset(FFR, rownames(FFR) %in% DEG_PPI)

MID1 <- read.csv("MID1filteredresult.csv")
rownames(MID1) <- MID1$Gene.Symbol
MID1 <- MID1[,28:37]
MID1 <- subset(MID1, rownames(MID1) %in% DEG_PPI)

MID2 <- read.csv("MID2filteredresult.csv")
rownames(MID2) <- MID2$Gene.Symbol
MID2 <- MID2[,39:49]
MID2 <- subset(MID2, rownames(MID2) %in% DEG_PPI)

MOR.SN <- read.csv("MOR.SNfilteredresult.csv")
rownames(MOR.SN) <- MOR.SN$Gene.Symbol
MOR.SN <- MOR.SN[,36:59]
MOR.SN <- subset(MOR.SN, rownames(MOR.SN) %in% DEG_PPI)

DUM <- read.csv("DUM_UniqueGene_DESeq2.csv")
rownames(DUM) <- DUM$hgnc_symbol
DUM <- DUM[,53:81]
DUM <- subset(DUM, rownames(DUM) %in% DEG_PPI)




#Find the gene names that all datasets have in common
DEG_com <- Reduce(intersect, list(rownames(DIJ), rownames(FFR), 
                                  rownames(LEW),rownames(MID1),rownames(MID2),
                                  rownames(MID3),rownames(MID4),rownames(MOR.FC),
                                  rownames(MOR.SN), rownames(DUM)))

#Subset each dataset with these common names so they are all the same size
DIJ <- subset(DIJ, rownames(DIJ) %in% DEG_com)
DUM <- subset(DUM, rownames(DUM) %in% DEG_com)
FFR <- subset(FFR, rownames(FFR) %in% DEG_com)
LEW <- subset(LEW, rownames(LEW) %in% DEG_com)
MID1 <- subset(MID1, rownames(MID1) %in% DEG_com)
MID2 <- subset(MID2, rownames(MID2) %in% DEG_com)
MID3 <- subset(MID3, rownames(MID3) %in% DEG_com)
MID4 <- subset(MID4, rownames(MID4) %in% DEG_com)
MOR.FC <- subset(MOR.FC, rownames(MOR.FC) %in% DEG_com)
MOR.SN <- subset(MOR.SN, rownames(MOR.SN) %in% DEG_com)

setwd("/users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/FamilialBlood")

write.csv(DIJ, "DIJ_DEGfilt.csv")
write.csv(DUM, "DUM_DEGfilt.csv")
write.csv(FFR, "FFR_DEGfilt.csv")
write.csv(LEW, "LEW_DEGfilt.csv")
write.csv(MID1, "MID1_DEGfilt.csv")
write.csv(MID2, "MID2_DEGfilt.csv")
write.csv(MID3, "MID3_DEGfilt.csv")
write.csv(MID4, "MID4_DEGfilt.csv")
write.csv(MOR.FC, "MOR.FC_DEGfilt.csv")
write.csv(MOR.SN, "MOR.SN_DEGfilt.csv")

#Run Correlation analysis on Sharc


#### filter correlations #### BOT/BOT2 not included due to less than 4 samples
setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/FamilialBlood/")

DIJ <- read.csv("DIJcorresult.csv")
DIJ$Gene1 <- as.character(lapply(strsplit(as.character(DIJ$X), "\\:"), "[", 2))
DIJ$Gene2 <- as.character(lapply(strsplit(as.character(DIJ$X), "\\:"), "[", 1))
DIJ <- DIJ[,c(5,6,1,2,3,4)]

DUM <- read.csv("DUMcorresult.csv")
DUM$Gene1 <- as.character(lapply(strsplit(as.character(DUM$X), "\\:"), "[", 2))
DUM$Gene2 <- as.character(lapply(strsplit(as.character(DUM$X), "\\:"), "[", 1))
DUM <- DUM[,c(5,6,1,2,3,4)]
DUM$X <- paste(DUM$Gene1,":",DUM$Gene2, sep = "")

FFR <- read.csv("FFRcorresult.csv")
FFR$Gene1 <- as.character(lapply(strsplit(as.character(FFR$X), "\\:"), "[", 2))
FFR$Gene2 <- as.character(lapply(strsplit(as.character(FFR$X), "\\:"), "[", 1))
FFR <- FFR[,c(5,6,1,2,3,4)]
FFR$X <- paste(FFR$Gene1,":",FFR$Gene2, sep = "")

LEW <- read.csv("LEWcorresult.csv")
LEW$Gene1 <- as.character(lapply(strsplit(as.character(LEW$X), "\\:"), "[", 2))
LEW$Gene2 <- as.character(lapply(strsplit(as.character(LEW$X), "\\:"), "[", 1))
LEW <- LEW[,c(5,6,1,2,3,4)]

MID1 <- read.csv("MID1corresult.csv")
MID1$Gene1 <- as.character(lapply(strsplit(as.character(MID1$X), "\\:"), "[", 2))
MID1$Gene2 <- as.character(lapply(strsplit(as.character(MID1$X), "\\:"), "[", 1))
MID1 <- MID1[,c(5,6,1,2,3,4)]
MID1$X <- paste(MID1$Gene1,":",MID1$Gene2, sep = "")

MID2 <- read.csv("MID2corresult.csv")
MID2$Gene1 <- as.character(lapply(strsplit(as.character(MID2$X), "\\:"), "[", 2))
MID2$Gene2 <- as.character(lapply(strsplit(as.character(MID2$X), "\\:"), "[", 1))
MID2 <- MID2[,c(5,6,1,2,3,4)]

MID3 <- read.csv("MID3corresult.csv")
MID3$Gene1 <- as.character(lapply(strsplit(as.character(MID3$X), "\\:"), "[", 2))
MID3$Gene2 <- as.character(lapply(strsplit(as.character(MID3$X), "\\:"), "[", 1))
MID3 <- MID3[,c(5,6,1,2,3,4)]

MID4 <- read.csv("MID4corresult.csv")
MID4$Gene1 <- as.character(lapply(strsplit(as.character(MID4$X), "\\:"), "[", 2))
MID4$Gene2 <- as.character(lapply(strsplit(as.character(MID4$X), "\\:"), "[", 1))
MID4 <- MID4[,c(5,6,1,2,3,4)]
MID4$X <- paste(MID4$Gene1,":",MID4$Gene2, sep = "")

MOR.FC<- read.csv("MOR.FCcorresult.csv")
MOR.FC$Gene1 <- as.character(lapply(strsplit(as.character(MOR.FC$X), "\\:"), "[", 2))
MOR.FC$Gene2 <- as.character(lapply(strsplit(as.character(MOR.FC$X), "\\:"), "[", 1))
MOR.FC <- MOR.FC[,c(5,6,1,2,3,4)]

MOR.SN<- read.csv("MOR.SNcorresult.csv")
MOR.SN$Gene1 <- as.character(lapply(strsplit(as.character(MOR.SN$X), "\\:"), "[", 2))
MOR.SN$Gene2 <- as.character(lapply(strsplit(as.character(MOR.SN$X), "\\:"), "[", 1))
MOR.SN <- MOR.SN[,c(5,6,1,2,3,4)]

thresh <- 0.1
### Filter by r value
DIJ_cor.5 <- DIJ[DIJ$reg.mat > thresh | DIJ$reg.mat < -thresh,]
# DUM_cor.5 <- DUM[DUM$reg.mat > thresh | DUM$reg.mat < -thresh,]
FFR_cor.5 <- FFR[FFR$reg.mat > thresh | FFR$reg.mat < -thresh,]
LEW_cor.5 <- LEW[LEW$reg.mat > thresh | LEW$reg.mat < -thresh,]
MID1_cor.5 <- MID1[MID1$reg.mat > thresh | MID1$reg.mat < -thresh,]
MID2_cor.5 <- MID2[MID2$reg.mat > thresh | MID2$reg.mat < -thresh,]
MID3_cor.5 <- MID3[MID3$reg.mat > thresh | MID3$reg.mat < -thresh,]
MID4_cor.5 <- MID4[MID4$reg.mat > thresh | MID4$reg.mat < -thresh,]
# MOR.FC_cor.5 <- MOR.FC[MOR.FC$reg.mat > thresh | MOR.FC$reg.mat < -thresh,]
MOR.SN_cor.5 <- MOR.SN[MOR.SN$reg.mat > thresh | MOR.SN$reg.mat < -thresh,]


### Find matches 

Commonedge <- Reduce(intersect, list(DIJ_cor.5$X,FFR_cor.5$X,
                                     LEW_cor.5$X, MID1_cor.5$X, MID2_cor.5$X,
                                     MID3_cor.5$X, MID4_cor.5$X,
                                     MOR.SN_cor.5$X))


#Subset each dataset with these common names so they are all the same size
DIJ_CE <- subset(DIJ_cor.5, DIJ_cor.5$X %in% Commonedge)
DIJ_CE <- DIJ_CE[order(DIJ_CE$X),]
# DUM_CE <- subset(DUM_cor.5, DUM_cor.5$X %in% Commonedge)
# DUM_CE <- DUM_CE[order(DUM_CE$X),]
FFR_CE <- subset(FFR_cor.5, FFR_cor.5$X %in% Commonedge)
FFR_CE <- FFR_CE[order(FFR_CE$X),]
LEW_CE <- subset(LEW_cor.5, LEW_cor.5$X %in% Commonedge)
LEW_CE <- LEW_CE[order(LEW_CE$X),]
MID1_CE <- subset(MID1_cor.5, MID1_cor.5$X %in% Commonedge)
MID1_CE <- MID1_CE[order(MID1_CE$X),]
MID2_CE <- subset(MID2_cor.5, MID2_cor.5$X %in% Commonedge)
MID2_CE <- MID2_CE[order(MID2_CE$X),]
MID3_CE <- subset(MID3_cor.5, MID3_cor.5$X %in% Commonedge)
MID3_CE <- MID3_CE[order(MID3_CE$X),]
MID4_CE <- subset(MID4_cor.5, MID4_cor.5$X %in% Commonedge)
MID4_CE <- MID4_CE[order(MID4_CE$X),]
# MOR.FC_CE <- subset(MOR.FC_cor.5, MOR.FC_cor.5$X %in% Commonedge)
# MOR.FC_CE <- MOR.FC_CE[order(MOR.FC_CE$X),]
MOR.SN_CE <- subset(MOR.SN_cor.5, MOR.SN_cor.5$X %in% Commonedge)
MOR.SN_CE <- MOR.SN_CE[order(MOR.SN_CE$X),]



CommonGroup <- data.frame(row.names = DIJ_CE$X,
                          DIJ = DIJ_CE$reg.mat,
                          FFR = FFR_CE$reg.mat,
                          LEW = LEW_CE$reg.mat,
                          MID1 = MID1_CE$reg.mat,
                          MID2 = MID2_CE$reg.mat, 
                          MID3 = MID3_CE$reg.mat, 
                          MID4 = MID4_CE$reg.mat, 
                          MOR.SN = MOR.SN_CE$reg.mat)


CG_conserved_up <- CommonGroup[apply(CommonGroup, MARGIN = 1, function(x) all(x > 0)), ]
CG_conserved_down <- CommonGroup[apply(CommonGroup, MARGIN = 1, function(x) all(x < 0)), ]

setwd("/Users/clairegreen/Documents/PhD/Parkinsons/Parkinsons_Code/Results/FamilialBlood/")
CG_samedir <- rbind(CG_conserved_up, CG_conserved_down)
CG_samedir$corMean <- rowMeans(CG_samedir, na.rm = FALSE, dims = 1)
CG_samedir$Gene <- rownames(CG_samedir)
CG_samedir$Gene1 <- as.character(lapply(strsplit(as.character(CG_samedir$Gene), "\\:"), "[", 2))
CG_samedir$Gene2 <- as.character(lapply(strsplit(as.character(CG_samedir$Gene), "\\:"), "[", 1))

write.csv(CG_samedir, "PD_samedir_1.csv", quote = F, row.names = F)

nodetable <- read.csv("PD_samedir_1node.csv")
nuggenes <- as.character(nodetable$name)
PDgenes <- readLines("/Users/clairegreen/Documents/PhD/Parkinsons/ParkinsonsDiseaseMalacards.txt")
celltype <- read.csv("~/Documents/PhD/TDP-43/TDP-43_Code/Results/PPI_Network/Zhang_BrainCelltype_Markers_braingenes.csv")
DEG <- readLines("ALS_sfblood_ALLgenes.txt")
nalls <- readLines("/Users/clairegreen/Documents/PhD/Parkinsons/NallsPDGWAS.txt")
targetvalPD <- c("SIRT2", "DNM1L", "STMN1", "DNAJA3", "TBK1", "RTCA", "ANXA7", "DNAJC12", "RTN4", "ADSL", "MDH1","ATP6V1G2", 
                 "YWHAZ", "ETS1")

nodetable$PDMalacards <- nodetable$shared.name %in% PDgenes
nodetable$DEG <- nodetable$name %in% DEG
nodetable$targetvalPD <- nodetable$shared.name %in% targetvalPD
nodetable$targetvalPD <- nodetable$shared.name %in% nalls

nodetable_celltype <- merge(celltype, nodetable, by.x = "Gene.symbol",  by.y = "shared.name", all = T)
nodetable_celltype <- subset(nodetable_celltype, !(nodetable_celltype$SUID == "NA"))


nodetable_celltype$targetvalPD <- nodetable_celltype$Gene.symbol %in% targetvalPD
write.csv(nodetable_celltype, "ModifiedPDNodetable.csv", row.names = F)

This analysis didn’t go as well as the TDP-43 case. The correlations were much more variable meaning I had to reduce the thresholds from 0.5 to 0.1. Additionally, I removed four datasets - the LRRK2 datasets couldn’t be used because they only had 2 and 3 samples - correlation would never calculated properly. DUM was an RNA-seq dataset so I was losing annotation coverage, and MOR.FC showed a really odd rho value distribution…

Looking back this is probably because it only has 3 control samples.

Using these 8 datasets, with a threshold of 0.1, This generated a network containing 279 nodes and 292 edges. Within this network was one clear largest connected component of 208 nodes and 249 edges.

If I increase the threshold to 0.2, the resulting network is much smaller: